-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Run field extraction concurrently in TS #128643
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ff7256a to
fdbf5c9
Compare
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
| @@ -94,22 +89,14 @@ private PhysicalPlan addFieldExtract( | |||
| attributesToExtract.addAll(extract.attributesToExtract()); | |||
| } | |||
| List<Attribute> attrs = query.attrs(); | |||
| if (keepDocAttribute == false) { | |||
| attrs = attrs.stream().filter(a -> EsQueryExec.isSourceAttribute(a) == false).toList(); | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed any more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's one of the overheads of running them separately. Previously, if all fields were loaded directly in the source operator, we could skip emitting the doc vector. Now, however, we need to emit it for ValuesSourceReaderOperator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice improvement! And there's still more room for parallelism, maybe across backing indices?
|
Thanks Kostas! |
This change extracts the field extraction logic previously run inside `TimeSeriesSourceOperator` into a separate operator, executed in a separate driver. We should consider consolidating this operator with `ValuesSourceReaderOperator`. I tried to extend `ValuesSourceReaderOperator` to support this, but it may take some time to complete. Our current goal is to move quickly with experimental support for querying time-series data in ES|QL, so I am proceeding with a separate operator. I will spend more time on combining these two operators later. With #128419 and this PR, the query time for the example below decreased from 41ms to 27ms. ``` POST /_query { "profile": true, "query": "TS metrics-hostmetricsreceiver.otel-default | WHERE @timestamp >= \"2025-05-08T18:00:08.001Z\" | STATS cpu = avg(rate(`metrics.process.cpu.time`)) BY host.name, BUCKET(@timestamp, 5 minute)" } ```
This change extracts the field extraction logic previously run inside `TimeSeriesSourceOperator` into a separate operator, executed in a separate driver. We should consider consolidating this operator with `ValuesSourceReaderOperator`. I tried to extend `ValuesSourceReaderOperator` to support this, but it may take some time to complete. Our current goal is to move quickly with experimental support for querying time-series data in ES|QL, so I am proceeding with a separate operator. I will spend more time on combining these two operators later. With elastic#128419 and this PR, the query time for the example below decreased from 41ms to 27ms. ``` POST /_query { "profile": true, "query": "TS metrics-hostmetricsreceiver.otel-default | WHERE @timestamp >= \"2025-05-08T18:00:08.001Z\" | STATS cpu = avg(rate(`metrics.process.cpu.time`)) BY host.name, BUCKET(@timestamp, 5 minute)" } ```
This change extracts the field extraction logic previously run inside `TimeSeriesSourceOperator` into a separate operator, executed in a separate driver. We should consider consolidating this operator with `ValuesSourceReaderOperator`. I tried to extend `ValuesSourceReaderOperator` to support this, but it may take some time to complete. Our current goal is to move quickly with experimental support for querying time-series data in ES|QL, so I am proceeding with a separate operator. I will spend more time on combining these two operators later. With elastic#128419 and this PR, the query time for the example below decreased from 41ms to 27ms. ``` POST /_query { "profile": true, "query": "TS metrics-hostmetricsreceiver.otel-default | WHERE @timestamp >= \"2025-05-08T18:00:08.001Z\" | STATS cpu = avg(rate(`metrics.process.cpu.time`)) BY host.name, BUCKET(@timestamp, 5 minute)" } ```
This change extracts the field extraction logic previously run inside `TimeSeriesSourceOperator` into a separate operator, executed in a separate driver. We should consider consolidating this operator with `ValuesSourceReaderOperator`. I tried to extend `ValuesSourceReaderOperator` to support this, but it may take some time to complete. Our current goal is to move quickly with experimental support for querying time-series data in ES|QL, so I am proceeding with a separate operator. I will spend more time on combining these two operators later. With elastic#128419 and this PR, the query time for the example below decreased from 41ms to 27ms. ``` POST /_query { "profile": true, "query": "TS metrics-hostmetricsreceiver.otel-default | WHERE @timestamp >= \"2025-05-08T18:00:08.001Z\" | STATS cpu = avg(rate(`metrics.process.cpu.time`)) BY host.name, BUCKET(@timestamp, 5 minute)" } ```
This change extracts the field extraction logic previously run inside
TimeSeriesSourceOperatorinto a separate operator, executed in a separate driver. We should consider consolidating this operator withValuesSourceReaderOperator. I tried to extendValuesSourceReaderOperatorto support this, but it may take some time to complete. Our current goal is to move quickly with experimental support for querying time-series data in ES|QL, so I am proceeding with a separate operator. I will spend more time on combining these two operators later.With #128419 and this PR, the query time for the example below decreased from 41ms to 27ms.